Fluent speech prosody: Framework and modeling

Authors

  • Chiu-yu Tseng
  • Shao-huang Pin
  • Yehlin Lee
  • Hsin-Min Wang
  • Yong-cheng Chen
Abstract

The prosody of fluent connected speech is far more complex than a concatenation of individual sentence intonations. Prosody frameworks and models should be based on a better understanding of both the production and perception of fluent speech. We analyzed speech corpora of read Mandarin Chinese discourses from a top-down perspective on perceived units and boundaries, and consistently identified multi-phrase speech paragraphs that reflect discourse effects in fluent speech. Subsequent cross-speaker and cross-speaking-rate acoustic analyses of the identified speech paragraphs revealed systematic cross-phrase patterns in every acoustic parameter examined: F0 contours, duration adjustment, intensity patterns, and boundary breaks. We therefore argue for a higher prosodic node that governs, constrains, and groups phrases to derive speech paragraphs, and we show how to account for the tune and rhythm characteristic of fluent speech prosody through cross-phrase specifications. A hierarchical multi-phrase framework is constructed to account for the governing effects, with complementary perceptual evidence. The framework specifies phrasal intonations as subjacent sister constituents subject to higher-level specifications, while the output fluent speech prosody is the cumulative result of contributions from every prosodic layer. To test our framework, we further constructed a modular prosody model of multiple-phrase grouping with four corresponding acoustic modules and have begun testing the model with speech synthesis. Finally, we argue that the development of unlimited TTS could benefit most appreciably from capturing cross-phrase relationships in prosody modeling.
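
The idea that output prosody accumulates contributions from every prosodic layer can be illustrated with a minimal sketch. The abstract does not say how the layers combine, so the additive rule here is an assumption made only for illustration, and the names `Layer` and `cumulative_contour` and all numeric values are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """One prosodic layer with a hypothetical per-syllable F0 offset."""
    name: str
    offsets: List[float]  # illustrative values, not measured data


def cumulative_contour(base: float, layers: List[Layer]) -> List[float]:
    """Add every layer's offset to a base value, syllable by syllable.

    Additive combination is an assumption of this sketch; the source
    only states that output prosody is cumulative across layers.
    """
    if not layers:
        return []
    n = len(layers[0].offsets)
    if any(len(layer.offsets) != n for layer in layers):
        raise ValueError("all layers must cover the same number of syllables")
    return [base + sum(layer.offsets[i] for layer in layers) for i in range(n)]


# A higher (paragraph-level) layer imposes a slow decline across phrases;
# the phrase layer adds its own local pattern on top of it.
layers = [
    Layer("paragraph", [2.0, 1.0, 0.0, -1.0]),
    Layer("phrase", [1.0, -1.0, 1.0, -1.0]),
]
contour = cumulative_contour(0.0, layers)
```

In this toy setup the phrase-level pattern repeats, while the paragraph-level layer shifts each repetition, which is one simple way to picture a higher node constraining sister phrases.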

Similar resources

Recognizing Mandarin Chinese Fluent Speech Using Prosody Information—an Initial Investigation

The aim of the present paper is to demonstrate how prosody information could be used to recognize Mandarin Chinese fluent speech and what the recognized results imply. By applying our hierarchical prosody framework for fluent speech [1, 2], which specifies boundary breaks and boundary information across phrases and groups phrases into speech paragraphs, we were able to develop software that automa...

Duration, intensity and pause predictions in relation to prosody organization

Our research group has postulated a perceptually based multi-phrase prosody framework for speech paragraphs in fluent speech using corpus analyses. The framework features a prosody hierarchy that organizes phrases and sentences into prosodic groups (PG) in connected speech, and specifies cross-phrase prosodic relationships in the acoustic domains [1, 2]. A corresponding fluent speech prosody m...

Research on dynamic characters of Chinese pitch contours

Chinese is a tone language. The F0 contour of a tone in continuous speech differs considerably from its contour when spoken in isolation. Current research on Chinese tone still centers on isolated tones; for tone in fluent speech, there are some statements about the phenomenon of two-word, three-word, and four-word co-readin...

Prosody and phonetic variability: Lessons learned from acoustic model clustering

Most research on the use of prosody in automatic speech processing has focused on F0, energy and duration correlates to prosodic structure. However, there are multiple sources of evidence suggesting that there are spectral correlates as well. This paper presents an analysis of prosodically labeled conversational speech data using acoustic parameters and clustering techniques that are standard i...

Higher Level Organization and Discourse Prosody

This paper addresses higher level organization in discourse prosody. Fluent speech prosody of text reading illustrated higher level speech planning above phrases and prosody segments above intonation units. Adopting a top-down perspective allowed a clearer reflection of the scope and units involved. We examined a large amount of speech data via a corpus approach, studied read discourse through perceived...

Journal:
  • Speech Communication

Volume 46

Publication date: 2005